In Chapter 1, you started on your journey with R. You installed R and RStudio. You downloaded and installed a number of packages that R needs to carry out the tasks of this course. You learned about assignment. Now that you have seen that you can accomplish important tasks in R without direct training in the details, let us return to the beginning and cover some basic operations and some basic computer programming concepts that will give you the basis for the statistical and machine learning tasks you will want to complete.

Basic Operations in R

R as a Calculator – Math Operations

You can do arithmetic directly in the R Console, using the same syntax you will use when writing expressions in your scripts. At the R prompt (“>”), type the expression you would like to calculate. Let us first try adding 5 and 7.

5 + 7
## [1] 12

The answer, 12, appears right below what you typed. You can also subtract, multiply and divide. Follow along with these examples in your own copy of R.

287 + 9875
## [1] 10162
27 - 12
## [1] 15
35.61 * 28.3
## [1] 1007.763
27865.35/762
## [1] 36.5687
25^2
## [1] 625

The last operator was ^, which is used to raise the number to the left of the operator to the power on the right side of the operator. In our example, it is 25 to the 2nd power, or 25 squared.

You will notice the [1] before the answer. This simply indicates that the answer is itself a vector, which has 1 element only. The [1] is the index number of the element of the vector. We will soon see cases in which answers for calculations can have multiple values, in which case, 1 is not the only index value possible.
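
To see an index other than [1], print a vector with more elements than fit on one Console line. The sketch below uses seq(), a function this chapter has not introduced, purely as an illustration. The points where the output wraps, and so the bracketed numbers you see at the start of each line, depend on the width of your Console.

```r
# seq() is used here only for illustration: it builds 10, 20, ..., 300
tens <- seq(10, 300, by = 10)

# Printing a 30-element vector usually wraps over several lines.
# Each line starts with the index of its first element.
tens

# Square brackets pull out a single element by its index
tens[14]
## [1] 140
```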

R as a Calculator – Math Functions

R can also calculate expressions that cannot be expressed by one of the four classic mathematical operators. There is a wide range of possible functions, of which the following are commonly used in statistics and the biological sciences:

Function       What It Does
abs(x)         absolute value of x
sqrt(x)        square root of x
log(x)         natural (Napierian) logarithm of x
exp(x)         e raised to the power x
log10(x)       logarithm base 10 of x
round(x, n)    round x to n decimal places

Here are some examples of these functions at work (see note 1).

abs(-25) 
## [1] 25
sqrt(2798)
## [1] 52.89612
log(377898)
## [1] 12.84238
exp(12.84238)
## [1] 377898.2
log10(377898)
## [1] 5.577375
round(exp(12.84238), 0)
## [1] 377898

These examples also show some of the quirks of mathematical functions calculated on the computer. We took the natural logarithm (that is, the logarithm to base \(e\)) of an arbitrary number (377898). R reports the result as 12.84238. However, R has rounded this result for display to seven significant digits, which here means five decimal places. Extended to 10 decimal places, it is 12.8423795969. The basic definition of a natural logarithm implies the equation:

\[e^{\log(x)} = x\]

However, when we calculated exp(12.84238) above, R returned 377898.2 rather than 377898, so the two sides do not appear to be equal. They are! The reason they do not appear equal is simply the number of digits that R uses to report a result in the Console. As the following code shows, R stores the full result in its memory, and using the stored value gives back the original number. The difference is rounding.

x <- 377898
y <- log(x) # calculate the log of x and assign it to y
exp(y)
## [1] 377898
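
If you would like to see the digits that R keeps but does not display, the following sketch may help. It uses print() with its digits argument and the base function all.equal(). Neither appears elsewhere in this chapter, so treat them as an aside rather than something you need right now.

```r
x <- 377898
y <- log(x)

# By default R displays 7 significant digits
y
## [1] 12.84238

# Ask for 12 significant digits to see more of the stored value
print(y, digits = 12)
## [1] 12.8423795969

# Typing the rounded value back in loses the hidden digits ...
exp(12.84238)
## [1] 377898.2

# ... but the stored value recovers x (up to tiny floating point error)
all.equal(exp(y), x)
## [1] TRUE
```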

Comments

VSS You will notice the second line of the above code has a comment after it. Comments are very important in programming and in statistical models. They give us information about what we have done and sometimes why we did it. A comment begins with a hashtag (pound sign) (#). The hashtag and everything that follows it on a line are not interpreted by R. You can write whatever you want there. If you have a comment that will spill over to a second line, that is okay. Just start the second and all succeeding lines with a hashtag. Your code should be heavily commented or you will be cursing at yourself in six months when you forget why you used certain code. And, you will forget. We all do.
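
As a small sketch of the commenting habits just described (the variable names and values are hypothetical, chosen only for illustration):

```r
# A comment can take up a whole line.
# If it runs past one line, start each
# following line with its own hashtag.

viral_load <- 1500              # a comment can also follow code on the same line
log_load <- log10(viral_load)   # take the base 10 logarithm of the value

# R ignores everything after the hashtag, so the next line does nothing:
# viral_load <- 0

viral_load
## [1] 1500
```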

Order of Calculation

You may remember from high school that arithmetic calculations must be done in a certain order. If you simply type 5 * 7 + 2, does it mean add 2 to the result of 5 times 7, which gives 37, or 5 times the result of 7 plus 2, which gives 45? To settle such questions, there are rules: computers calculate arithmetic expressions in a fixed order. The following table shows that order, which is known by the mnemonic “PEMDAS”.

Operation        Symbol   Example            PEMDAS
parentheses      ()       5 * (7 + 2) = 45   P
exponents        ^        5^2 = 25           E
multiplication   *        5 * 7 = 35         M
division         /        25/5 = 5           D
addition         +        5 + 7 = 12         A
subtraction      -        5 - 7 = -2         S

Just to help this sink in, let us consider again the alternative meanings of the example I started this section with.

5 * 7 + 2
## [1] 37
5 * (7 + 2)
## [1] 45

The natural order, without parentheses, is to do the multiplication ahead of the addition (M before A in PEMDAS). In the second version, the parentheses mean that anything within them is calculated first (7 plus 2), and that sum is then multiplied by the initial number, 5.
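
Two details that the mnemonic glosses over are worth trying for yourself. Multiplication and division share the same rank and are worked from left to right, as are addition and subtraction, and R applies an exponent before a leading minus sign. The following lines are an illustrative sketch:

```r
# Multiplication and division have equal rank: evaluated left to right
8 / 4 * 2
## [1] 4

# Addition and subtraction also have equal rank: evaluated left to right
10 - 4 + 3
## [1] 9

# The exponent is applied before the minus sign, so this is -(2^2)
-2^2
## [1] -4

# Parentheses make the intended meaning explicit
(-2)^2
## [1] 4
```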

Assignment (Again) – Naming Variables

R imposes a few rules on the names you assign to values. These names make those values into variables because their values can change over time. It is easiest to state the rules positively: what you are permitted to do. After the primary rule, I’ll show some types of names that are not permitted.

Primary Rule for Variable Names

Variable names must contain only letters (either upper or lower case), numbers and the symbols . and _. You must start a variable name with a letter (see note 2).

Some corollaries to this rule:

  • Variable names cannot include spaces. We use “snake case” (connecting words with underscores (_)) to overcome this restriction. So, viral load is not a permitted name, but viral_load is.

  • R has a number of reserved words that cannot be used for variable names. They all have specific meanings in R and will confuse the interpreter (see note 3) if you attempt to use them as variable names. They include: if, else, repeat, for, while, function, in, next, break, TRUE, FALSE, NULL, Inf, NA, NaN.

  • Variable names in R are case sensitive. Therefore, variable and Variable are two separate variables as are x and X.
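
The corollaries above can be checked directly in the Console. In this sketch the variable names and values are made up for illustration, and make.names() is a base R function, not otherwise covered in this chapter, that converts arbitrary text into a permitted name:

```r
viral_load <- 5000   # snake case: permitted
Viral_load <- 75     # a different variable, because names are case sensitive

viral_load
## [1] 5000
Viral_load
## [1] 75

# make.names() shows how R would repair a name that breaks the rules;
# here the space is replaced by a period
make.names("viral load")
## [1] "viral.load"
```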

Snake_case-camelCase


Variable Name Customs and Recommendations

  • Make your variable names clear and informative. Calling a variable x will not help you remember what your variable is and what kind of values it should have. Spending a bit of space to create a longish, but clear name will save immense amounts of time in the long run.

  • Make a data dictionary. Keep a record of what your variable names are, what kind of data they should contain (e.g., character, numeric, logical) and what range of values they are likely to contain.

  • Despite what I said in the first of these points, keep your variable names as short as possible, while still keeping their meaning clear.

  • An alternative to snake case as recommended above is “camel case”, which combines the words in the variable name and makes upper case the first letter of the second and all subsequent words. So, viralLoad would be the camel case version of viral_load and you could have a variable named thisIsALongVariableName if you truly wished. Choosing between camel case and snake case is a matter of taste.

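  • Almost all the above rules and recommendations can be broken if you surround your variable name with back ticks (grave accents) (`). So, `viral load` becomes a legal variable name despite the space. Single quotation marks (') also work when you assign a value, as in 'viral load' <- 5, but when you later use the variable you must write it with back ticks, because R reads 'viral load' on its own as a character string.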

VSS Try, really try hard, not to use this last option because it makes your code hard to read and hard to edit. Snake case is very much the preferred alternative.
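
For completeness, here is a sketch of what a back-ticked name looks like in practice, which may also show why it is awkward (the name and value are hypothetical):

```r
# Back ticks allow a name that contains a space
`viral load` <- 1200

# The back ticks are needed every time the variable is used
`viral load` * 2
## [1] 2400

# With quotation marks, R sees a character string rather than the variable
"viral load"
## [1] "viral load"

# The snake case equivalent needs no special treatment
viral_load <- 1200
viral_load * 2
## [1] 2400
```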

Are There Ways to Avoid Typing So Much?

RStudio wants to help you out with R's commands and functions. There are three methods that it uses to offer you help.

RStudio Help

In addition to the help screens in the Help tab, you can start to write a function name, hit TAB and RStudio will tell you what the arguments are for that function.

RStudio Function Help


If you begin to type the function name round, RStudio will prompt you with the various functions possible and other relevant information.
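
If you are working outside RStudio, or simply prefer to type, base R offers Console equivalents of these prompts. The functions below are an addition for comparison only; they are not a description of how RStudio builds its pop-ups.

```r
# Show the arguments that round() accepts
args(round)

# List the objects whose names start with "round"
apropos("^round")

# Open the help page for round(); ?round is shorthand for the same thing
help("round")
```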

Autocompletion

RStudio also uses the TAB key to complete the entry of a function. So, typing roun as in the figure and then typing TAB will complete the entry of the round() function.

Command History

RStudio stores the commands you have typed. If you want to reuse a command as you previously typed it, or use it as a model for editing, go to the History tab in the upper right pane and choose the command you would like (among the more than 200 that it stores). Then send it either to the Console or to the file you are working on in the Source pane (upper left) with one of the buttons in the tab’s menu bar.

RStudio Command History



  1. When we execute a function in R, much of the literature refers to “calling” the function. It just means to execute it.

  2. There is an exception to this rule that affects R system variables. Internal variables can start with a period (.), but these variables are only useful to R developers, those programmers who are building packages to provide R with new capabilities.

  3. R is an interpreted computer language, which means that when you instruct R to run a script or execute a line, it immediately does so by translating the code into machine language that the processor understands. This contrasts with a compiled computer language, like C or FORTRAN, in which compiling first produces an intermediate, machine-intelligible version of each piece of your program and then joins those pieces together in an executable file (.exe in Windows), which is what you run. R itself and the Microsoft Office programs are examples of compiled programs.